Modern Pathology — Latest Matching Preprints

1

NKG2C Improves Diagnostic Specificity of NK Cell Receptor Restriction by Identifying Non-Neoplastic Adaptive NK Cell Clones

Wilk, A. J.; Gitana, G.; Oak, J.

2025-11-22 pathology 10.1101/2025.11.18.25340429 medRxiv

Top 0.1%

62.8%

Natural killer (NK) cell neoplasms are a diverse group of entities with often nonspecific clinical presentations, making immunophenotyping essential for diagnosis. Immunophenotyping by flow cytometry can identify clonal NK cell populations by detecting restricted expression patterns of NK cell receptors such as killer cell immunoglobulin-like receptors (KIRs). However, reactive NK cells may also demonstrate KIR restriction through expansion of self-KIR-expressing NK cells, leading to identification of NK clones of uncertain significance (NK-CUS). A well-described reactive NK subset, termed "adaptive" NK cells, arises in response to cytomegalovirus (CMV) infection or reactivation, often appears KIR-restricted, and is defined by coexpression of CD57 and the activating receptor NKG2C. Because CMV reactivation is common among patients undergoing evaluation for hematolymphoid malignancy, we hypothesized that NK-CUS may frequently correspond to this non-neoplastic adaptive NK cell subset. Here, we describe a flow cytometry panel for immunophenotypic characterization of cytotoxic lymphocytes that includes NKG2C, enabling detection of non-neoplastic adaptive NK cells. We show that NK-CUS frequently represent reactive NKG2C+ adaptive NK cells. We describe several cases that meet diagnostic criteria for NK-large granular lymphocytic leukemia (NK-LGLL) and demonstrate that the NK cell clones are non-neoplastic NKG2C+ adaptive NK cells arising in the setting of CMV viremia. Further, we show that NKG2C expression is uncommon in cytotoxic lymphocyte malignancies with recurrent molecular or cytogenetic abnormalities. Collectively, we demonstrate that NKG2C has a high specificity for reactive NK cell populations, and its inclusion in NK cell immunophenotyping panels is a useful strategy to more reliably distinguish between neoplastic and reactive NK cell populations.

2

iQC: machine-learning-driven prediction of surgical procedure uncovers systematic confounds of cancer whole slide images in specific medical centers

Schaumberg, A. J.; Lewis, M. S.; Nazarian, R.; Wadhwa, A.; Kane, N.; Turner, G.; Karnam, P.; Devineni, P.; Wolfe, N.; Kintner, R.; Rettig, M. B.; Knudsen, B. S.; Garraway, I. P.; Pyarajan, S.

2023-12-13 pathology 10.1101/2023.09.19.23295798 medRxiv

Top 0.1%

45.4%

Problem: The past decades have yielded an explosion of research using artificial intelligence for cancer detection and diagnosis in the field of computational pathology. Yet an often unspoken assumption of this research is that a glass microscopy slide faithfully represents the underlying disease. Here we show that systematic failure modes may dominate the slides digitized from a given medical center, such that neither the whole slide images nor the glass slides are suitable for rendering a diagnosis. Methods: We quantitatively define high-quality data as a set of whole slide images for which the type of surgery the patient received can be accurately predicted by an automated system such as ours, called "iQC". We find that iQC accurately distinguished biopsies from nonbiopsies, e.g., prostatectomies or transurethral resections (TURPs, a.k.a. prostate chips), only when the data qualitatively appeared to be of high quality, e.g., vibrant histopathology stains and minimal artifacts. Crucially, prostate needle biopsies appear as thin strands of tissue, whereas prostatectomies and TURPs appear as larger rectangular blocks of tissue. Therefore, when the data are of high quality, iQC (i) accurately classifies pixels as tissue, (ii) accurately generates statistics that describe the distribution of tissue in a slide, and (iii) accurately predicts surgical procedure from said statistics. We additionally compare iQC to HistoQC, both in terms of how many slides are excluded and how much tissue is identified in the slides. Results: While we do not control any medical center's protocols for making or storing slides, we developed the iQC tool to hold all medical centers and datasets to the same objective standard of quality. We validate this standard across five Veterans Affairs Medical Centers (VAMCs) and the Automated Gleason Grading Challenge (AGGC) 2022 public dataset.
For our surgical procedure prediction task, we report an area under the receiver operating characteristic curve (AUROC) of 0.9966 to 1.000 at the VAMCs that consistently produce high-quality data and an AUROC of 0.9824 for the AGGC dataset. In contrast, we report an AUROC of 0.7115 at the VAMC that consistently produced poor-quality data. An attending pathologist determined that poor data quality was likely driven by faded histopathology stains and protocol differences among VAMCs. Corroborating this, iQC's novel stain strength statistic finds that this institution has significantly weaker stains (p < 2.2 × 10⁻¹⁶, two-tailed Wilcoxon rank-sum test) than the VAMC that contributed the most slides, and this stain strength difference is a large effect (Cohen's d = 1.208). In addition to accurately detecting the distribution of tissue in slides, iQC recommends only 2 of 3736 VAMC slides (0.05%) for review for inadequate tissue. With its default configuration file, HistoQC excluded 89.9% of VAMC slides because tissue was not detected in them. With our customized configuration file for HistoQC, we reduced this to 16.7% of VAMC slides. Strikingly, the default configuration of HistoQC included 94.0% of the 1172 prostate cancer slides from The Cancer Genome Atlas (TCGA), which may suggest that HistoQC defaults were calibrated against TCGA data and that this calibration did not generalize well to non-TCGA datasets. For VAMC and TCGA, we find a negligible to small degree of agreement in the include/exclude status of slides, which may suggest that iQC and HistoQC are not equivalent. Conclusion: Our surgical procedure prediction AUROC may be a quantitative indicator positively associated with high data quality at a medical center or for a specific dataset. We find that iQC accurately identifies tissue in slides and excludes few slides, unless the data are of poor quality. To produce high-quality data, we recommend producing slides using robotics or other forms of automation whenever possible.
We recommend scanning slides digitally before the glass slide has time to develop signs of age, e.g., faded stains and acrylamide bubbles. We recommend using high-quality reagents to stain and mount slides, which may slow aging. We recommend protecting stored slides from ultraviolet light, humidity, and changes in temperature. To our knowledge, iQC is the first automated system in computational pathology that validates data quality against objective evidence, e.g., surgical procedure data available in the EHR or LIMS, which requires no effort or annotation from anatomic pathologists. Please see https://github.com/schaumba/iqc and https://doi.org/10.17605/OSF.IO/AVD3Z for instructions and updates.
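The effect size quoted above (Cohen's d = 1.208) expresses the difference in mean stain strength in units of the within-group standard deviation. A minimal sketch, assuming the common pooled-standard-deviation form of Cohen's d (the abstract does not say which variant iQC reports) and using made-up stain-strength values rather than iQC's actual statistic:

```python
import math
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = stdev(group_a) ** 2, stdev(group_b) ** 2
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (mean(group_a) - mean(group_b)) / pooled_sd


# Hypothetical per-slide stain-strength scores for two centers (illustrative only).
reference_center = [0.8, 0.9, 1.0]
weaker_center = [0.5, 0.6, 0.7]
print(round(cohens_d(reference_center, weaker_center), 3))  # → 3.0
```

By Cohen's conventional rule of thumb, |d| of 0.8 or more is read as a large effect, which is consistent with describing d = 1.208 as large.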

3

Evaluating Large Language Models in Interpreting Cervical Cytology

Geetha, S. D.

2025-11-06 pathology 10.1101/2025.11.04.25339501 medRxiv

Top 0.1%

42.6%

Background: Large language models (LLMs) have shown promise in medical imaging, but their utility in cytology remains underexplored. This study evaluates GPT-5 and Gemini 2.5 Pro for Pap smear interpretation. Methods: Digital cervical Pap smear images of 100 cases were obtained from the Hologic Education Site, with Hologic diagnoses considered the gold standard. Representative images were uploaded into GPT-5 and Gemini 2.5 Pro, and each model was prompted to provide a diagnosis based on the third edition of the Bethesda System for Reporting Cervical Cytopathology. Cases with infectious organisms were assessed using additional images. Concordance was evaluated both for exact diagnoses and for clinical management groupings, in which diagnoses with similar management implications were grouped. Sensitivity and specificity for abnormal cytology were also calculated. Results: Exact-match concordance was comparable for the two LLMs (GPT-5: 47%; Gemini: 48%) and increased to 66% with clinical management grouping. GPT-5 performed best for low-grade squamous intraepithelial lesions (75%), whereas Gemini 2.5 Pro showed the highest concordance in the high-grade squamous intraepithelial lesion (HSIL) category (82%), although this was largely attributable to its strong tendency to overcall cases as HSIL. Sensitivity for detecting abnormal cytology was 74% for GPT-5 and 84% for Gemini, with specificity of 74% and 71%, respectively. GPT-5 better identified glandular lesions, while Gemini detected organisms more accurately (71% vs. 20%). Conclusions: Current LLMs demonstrate moderate ability to identify cytologic abnormalities but are not yet reliable for independent Pap smear interpretation. Targeted fine-tuning, prompt optimization, and cytology-specific training could enhance their utility as adjunctive tools in cytology workflows.
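The two concordance levels and the abnormal-versus-normal sensitivity and specificity described above can all be computed from paired reference and model diagnoses. A minimal sketch; the management grouping below is a hypothetical simplification of Bethesda categories, since the study's actual grouping (and its handling of organisms) is not given in the abstract:

```python
# Hypothetical mapping of Bethesda categories to management groups (assumption for illustration).
MANAGEMENT_GROUP = {
    "NILM": "negative",
    "ASC-US": "low-grade",
    "LSIL": "low-grade",
    "ASC-H": "high-grade",
    "HSIL": "high-grade",
    "AGC": "glandular",
}
ABNORMAL = {category for category in MANAGEMENT_GROUP if category != "NILM"}


def evaluate(reference, predicted):
    """Exact and grouped concordance, plus sensitivity/specificity for any abnormal call."""
    pairs = list(zip(reference, predicted))
    n = len(pairs)
    exact = sum(r == p for r, p in pairs) / n
    grouped = sum(MANAGEMENT_GROUP[r] == MANAGEMENT_GROUP[p] for r, p in pairs) / n
    tp = sum(r in ABNORMAL and p in ABNORMAL for r, p in pairs)
    fn = sum(r in ABNORMAL and p not in ABNORMAL for r, p in pairs)
    tn = sum(r not in ABNORMAL and p not in ABNORMAL for r, p in pairs)
    fp = sum(r not in ABNORMAL and p in ABNORMAL for r, p in pairs)
    return {
        "exact": exact,
        "grouped": grouped,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


reference = ["NILM", "LSIL", "HSIL", "ASC-H", "NILM"]
predicted = ["NILM", "ASC-US", "HSIL", "HSIL", "LSIL"]
print(evaluate(reference, predicted))
# → {'exact': 0.4, 'grouped': 0.8, 'sensitivity': 1.0, 'specificity': 0.5}
```

Grouping can only raise concordance, because every exact match is also a group match; this is why a grouped figure such as 66% exceeds the exact-match figures.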

4

A Definitive TCRβ1/TCRβ2 Antibody Pair For Determining T-Cell Monotypia As A Surrogate For Clonality In Lymphoma Diagnosis In Formalin-Fixed Paraffin-Embedded Material

Kaistha, A.; Situ, J. J.; Evans, S. C.; Ashton-Key, M.; Ogg, G.; Soilleux, E. J.

2026-02-17 pathology 10.64898/2026.02.13.26346202 medRxiv

Top 0.1%

40.9%

T-cell lymphomas are often histologically indistinguishable from benign T-cell infiltrates, so clonality testing is frequently required for diagnosis. However, clonality testing lacks spatial context and is slow and expensive, relying on complex, multiplexed PCR reactions interpreted by experienced scientists or pathologists. We previously published details of a pair of highly specific monoclonal antibodies against the two alternatively used, but very similar, T-cell receptor β constant regions, TCRβ1 and TCRβ2. We demonstrated the feasibility of immunohistochemical detection of TCRβ1 and TCRβ2 in formalin-fixed, paraffin-embedded (FFPE) tissue as a novel diagnostic strategy for T-cell lymphomas. Here we validate an improved pairing of TCRβ1/2 rabbit monoclonal antibodies and demonstrate their utility for single and double immunostaining, including with a chimeric mouse anti-TCRβ2 antibody. Finally, we show that this staining is amenable to automated cell counting, permitting accurate calculation of the TCRβ2:TCRβ1 ratio.
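Once automated counting has classified each T cell, the TCRβ2:TCRβ1 ratio is a quotient of the two counts. A minimal sketch, assuming hypothetical per-cell labels ("TCRB1", "TCRB2", or any other string for unstained or unclassified cells); the abstract does not give cut-offs for calling a population monotypic, so none are encoded here:

```python
import math
from collections import Counter


def tcrb2_to_tcrb1_ratio(cell_labels):
    """TCRβ2:TCRβ1 ratio from per-cell labels; cells with any other label are ignored."""
    counts = Counter(cell_labels)
    n_tcrb1, n_tcrb2 = counts["TCRB1"], counts["TCRB2"]
    if n_tcrb1 == 0:
        # Unbounded if only TCRβ2+ cells were counted, undefined if neither was.
        return math.inf if n_tcrb2 > 0 else math.nan
    return n_tcrb2 / n_tcrb1


labels = ["TCRB2"] * 3 + ["TCRB1"] * 2 + ["unstained"]
print(tcrb2_to_tcrb1_ratio(labels))  # → 1.5
```

Because each T cell uses one of the two constant regions, a polyclonal infiltrate contains a mixture of TCRβ1+ and TCRβ2+ cells, whereas a monotypic population skews the ratio strongly toward one of them.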

5

Robust, credible, and interpretable AI-based histopathological prostate cancer grading

Westhaeusser, F.; Fuhlert, P.; Dietrich, E.; Lennartz, M.; Khatri, R.; Kaiser, N.; Roebeck, P.; Buelow, R.; von Stillfried, S.; Witte, A.; Ladjevardi, S.; Drotte, A.; Severgardh, P.; Baumbach, J.; Puelles, V. G.; Haeggman, M.; Brehler, M.; Boor, P.; Walhagen, P.; Dragomir, A.; Busch, C.; Graefen, M.; Bengtsson, E.; Sauter, G.; Zimmermann, M.; Bonn, S.

2024-07-10 pathology 10.1101/2024.07.09.24310082 medRxiv

Top 0.1%

40.9%

Background: Prostate cancer (PCa) is among the most common cancers in men, and its diagnosis requires the histopathological evaluation of biopsies by human experts. While several recent artificial intelligence (AI)-based approaches have reached human expert-level PCa grading, they often display significantly reduced performance on external datasets. This reduced performance can be caused by variations in sample preparation, for instance the staining protocol, section thickness, or scanner used. Another limiting factor of contemporary AI-based PCa grading is that models are trained to predict ISUP grades, which perpetuates human annotation errors. Methods: We developed the prostate cancer aggressiveness index (PCAI), an AI-based PCa detection and grading framework that is trained on objective patient outcome rather than subjective ISUP grades. We designed PCAI as a clinical application, containing algorithmic modules that offer robustness to data variation, medical interpretability, and a measure of prediction confidence. To train and evaluate PCAI, we generated a multicentric, retrospective, observational trial consisting of six cohorts with 25,591 patients, 83,864 images, and 5 years of median follow-up from 5 different centers in 3 countries. This includes a high-variance dataset of 8,157 patients and 28,236 images with variations in sample thickness, staining protocol, and scanner, allowing for the systematic evaluation and optimization of model robustness to data variation. The performance of PCAI was assessed on three external test cohorts from two countries, comprising 2,255 patients and 9,437 images. Findings: Using our high-variance datasets, we show how differences in sample processing, particularly slide thickness and staining time, significantly reduce the performance of AI-based PCa grading by up to 6.2 percentage points in the concordance index (C-index).
We show how a select set of algorithmic improvements, including domain adversarial training, conferred robustness to data variation, interpretability, and a measure of credibility to PCAI. These changes led to significant improvements in prediction across two biopsy cohorts and one TMA cohort, systematically exceeding expert ISUP grading in C-index and AUROC by up to 22 percentage points. Interpretation: Data variation poses serious risks for AI-based histopathological PCa grading, even when models are trained on large datasets. Algorithmic improvements for model robustness, interpretability, and credibility, together with training on high-variance data and outcome-based severity prediction, give rise to robust models with PCa grading performance above ISUP level.
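The concordance index (C-index) used above measures how often a model orders patients' risk in the same way as their observed outcomes. A minimal sketch of Harrell's C for right-censored time-to-event data; this is the textbook definition, not PCAI's evaluation code, and for simplicity it skips pairs with tied follow-up times:

```python
import math
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data.

    A pair is comparable when the subject with the shorter follow-up had an event
    (event flag truthy). It is concordant when that subject also has the higher
    predicted risk; tied risk scores count as one half.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[j] < times[i]:
            i, j = j, i  # make i the subject with the shorter follow-up
        if times[i] == times[j] or not events[i]:
            continue  # tied times, or earlier subject censored: not comparable here
        comparable += 1
        if risk_scores[i] > risk_scores[j]:
            concordant += 1.0
        elif risk_scores[i] == risk_scores[j]:
            concordant += 0.5
    return concordant / comparable if comparable else math.nan


# Hypothetical follow-up (years), event flags (1 = event observed), and model risk scores.
print(harrell_c_index([2, 5, 8, 11], [1, 1, 0, 1], [0.9, 0.6, 0.7, 0.1]))  # → 0.8
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering, so a difference of "percentage points in C-index" is a difference on this 0 to 1 scale expressed in hundredths.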

6

Self-Supervised Learning Can Distinguish Myelodysplastic Neoplasms from Clinical Mimics Using Bone Marrow Biopsies

Mehrtash, V.; Le, H.; Jafarzadeh, B.; Loghavi, S.; Garcia-Manero, G.; Tsirigos, A.; Park, C. Y.

2025-02-21 pathology 10.1101/2025.02.17.25322075 medRxiv

Top 0.1%

40.9%

The diagnosis of myelodysplastic neoplasms (MDS) requires examination of the bone marrow for morphologic evidence of dysplasia. We sought to determine whether a self-supervised learning (SSL) AI image analysis approach can reliably distinguish MDS from its clinically relevant mimics using bone marrow biopsies (BMBx). Whole slide images (WSIs) of H&E- and reticulin-stained BMBx sections from 243 unique patients (89 MDS, 55 non-MDS cytopenic controls [NMCC], and 99 negative control [NC] cases) were segmented into tiles and analyzed. These tiles were then processed using the Barlow Twins SSL model to generate histomorphologic phenotype clusters (HPCs). Review of the HPCs revealed that the clusters enriched in MDS captured known histopathologic features of MDS, including hypercellularity, dysplastic and clustered megakaryocytes, increased immature hematopoietic cells, increased vascularity, fibrosis, and cell streaming patterns. Assessment of 95 MDS BMBx images from a second institution showed consistent HPC enrichment patterns, validating the robustness of the model. The trained ensemble model using H&E- and reticulin-stained slides distinguished MDS from NCs with an AUC of 0.89 and from age-matched NMCCs with an AUC of 0.84. These findings demonstrate the potential of SSL approaches to capture diagnostically relevant morphologic patterns and to improve the reproducibility of MDS diagnosis.
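The AUC values above summarize how well slide-level scores separate two groups, such as MDS versus negative controls. A minimal sketch of the area under the ROC curve computed as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one; the scores are hypothetical and this is not the study's ensemble model:

```python
def auroc(positive_scores, negative_scores):
    """AUROC as P(score_pos > score_neg), counting ties as one half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


mds_scores = [0.9, 0.3]      # hypothetical model outputs for MDS slides
control_scores = [0.5, 0.1]  # hypothetical model outputs for control slides
print(auroc(mds_scores, control_scores))  # → 0.75
```

The pairwise loop is quadratic in the number of slides; a rank-sum formulation gives the same value more efficiently for large cohorts.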

7

Improving the diagnosis and classification of Ph-negative myeloproliferative neoplasms through deep phenotyping

Sirinukunwattana, K.; Aberdeen, A.; Theissen, H.; Sousos, N.; Psaila, B.; Mead, A.; Turner, G.; Rees, G.; Rittscher, J.; Royston, D.

2019-09-11 pathology 10.1101/762013 medRxiv

Top 0.1%

35.6%

Myeloproliferative neoplasms (MPNs) are clonal disorders characterized by excessive proliferation of myeloid lineages. Accurate classification and appropriate management of MPNs require integration of clinical, morphological, and genetic findings. Despite major advances in understanding their molecular and genetic basis, morphological assessment of the bone marrow trephine (BMT) remains paramount in differentiating between MPN subtypes and reactive conditions. However, morphological assessment is heavily constrained by its reliance on subjective, qualitative, and poorly reproducible criteria. To address this, we have developed a machine-learning strategy for the automated identification and quantitative analysis of megakaryocyte morphology using clinical BMT samples. Using a cohort of recently diagnosed or established essential thrombocythemia (ET; n = 48) and reactive control cases (n = 42), we demonstrated high predictive accuracy (AUC = 0.95) for automated tissue-based ET diagnosis based on these specific megakaryocyte phenotypes. These distinct morphological phenotypes showed evidence of specific genotype associations, which suggests that an automated cell phenotyping approach may be of clinical diagnostic utility as an adjunct to standard genetic and molecular tests. This has great potential to assist in the routine assessment of patients with newly diagnosed or suspected MPN and of those undergoing treatment or clinical follow-up. The extraction of quantitative morphological data from BMT sections will also have value in the assessment of new therapeutic strategies directed towards the bone marrow microenvironment and can provide clinicians and researchers with objective, quantitative data without significant demands upon current routine specimen workflows.

8

Platelets Outperform Leukocytes in Transcriptomic Liquid Biopsy Profiling of Myeloproliferative Neoplasms

Shen, Z.; Sawalkar, A.; Wu, J.; Natu, V.; Rowley, J.; Rondina, M. T.; Krishnan, A.

2026-04-01 pathology 10.64898/2026.03.30.714941 medRxiv

Top 0.1%

34.5%

Myeloproliferative neoplasms (MPNs) are characterized by progressive myelofibrosis that drives morbidity and mortality. Liquid biopsy approaches to noninvasively monitor fibrotic progression remain limited. We performed comparative transcriptomic profiling of CD45-depleted platelet-enriched and CD45+ leukocyte-enriched fractions from matched peripheral blood samples of 76 individuals (27 primary myelofibrosis, 17 polycythemia vera, 14 essential thrombocythemia, 18 healthy controls). Platelet RNA sequencing was performed in 2018-2020 on Illumina HiSeq 4000, while WBC RNA sequencing was conducted in 2023 on Illumina NovaSeq 6000 from cryopreserved CD45+ enriched fractions of specimens obtained at the identical time and from the same blood sample as the platelet RNA. Despite comparable library preparation protocols and higher sequencing depth in WBC samples, platelet transcriptomes exhibited 5.1-fold more differential expression in myelofibrosis (3,453 versus 681 genes, adjusted p<0.05, |log2FC|>1). Platelet signatures were enriched for proteostasis pathways including endoplasmic reticulum stress and unfolded protein response, reflecting megakaryocyte dysfunction in the fibrotic bone marrow niche. WBC signatures predominantly featured immune activation and proliferative pathways, indicating systemic inflammatory responses. Multinomial LASSO classification demonstrated superior performance of platelet-based models for myelofibrosis diagnosis (AUROC 0.85) compared to WBC-based (AUROC 0.77) or clinical models (AUROC 0.59). Combined platelet+WBC models did not improve performance (AUROC 0.80), indicating complementary but non-additive information. These findings establish platelet transcriptomic profiling as a superior noninvasive biomarker platform for monitoring myelofibrosis in MPNs, capturing megakaryocyte-driven fibrogenesis with greater sensitivity than peripheral leukocyte-based approaches. 
Highlights: Using matched WBC and platelet RNA-seq from MPN patients, we identify myelofibrosis-associated transcriptomic signatures specifically enriched in platelets. Multinomial LASSO modeling highlights platelet-derived gene expression as a dominant and predictive biomarker of myelofibrosis, outperforming clinical parameters and WBC signatures.
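The differential-expression cut-offs above (adjusted p < 0.05 and |log2FC| > 1) amount to a filter applied after multiple-testing correction. A minimal sketch, assuming Benjamini-Hochberg false-discovery-rate adjustment (the abstract says only "adjusted p") and hypothetical gene-level p-values and fold changes:

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda k: pvalues[k])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def differentially_expressed(genes, pvalues, log2fc, alpha=0.05, min_abs_lfc=1.0):
    """Genes passing both the adjusted-p and the absolute log2 fold-change thresholds."""
    padj = benjamini_hochberg(pvalues)
    return [g for g, q, lfc in zip(genes, padj, log2fc) if q < alpha and abs(lfc) > min_abs_lfc]


# Hypothetical genes, raw p-values, and log2 fold changes (illustrative only).
print(differentially_expressed(["A", "B", "C", "D"], [0.01, 0.04, 0.03, 0.5], [2.0, -1.5, 3.0, 4.0]))  # → ['A']
```

Comparing the two resulting gene counts gives the fold difference quoted in the abstract: 3,453 / 681 ≈ 5.07, which rounds to 5.1.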

9

The future of computational pathology: expectations regarding the anticipated role of artificial intelligence in pathology by 2030

Berbis, M. A.; McClintock, D. S.; Bychkov, A.; Cheng, J. Y.; Delahunt, B.; Egevad, L.; Eloy, C.; Farris, A. B.; Fraggetta, F.; Garcia del Moral, R.; Hartman, D. J.; Herrmann, M. D.; Hollemans, E.; Iczkowski, K. A.; Karsan, A.; Kriegsmann, M.; Lennerz, J. K.; Pantanowitz, L.; Salama, M. E.; Sinard, J.; Tuthill, M.; Van der Laak, J.; Williams, B.; Casado-Sanchez, C.; Sanchez-Turrion, V.; Luna, A.; Aneiros-Fernandez, J.; Shen, J.

2022-09-04 pathology 10.1101/2022.09.02.22279476 medRxiv

Top 0.1%

34.2%

Background: Artificial intelligence (AI) is rapidly fueling a fundamental transformation in the practice of pathology. However, AI's clinical integration remains challenging, with no AI algorithms to date enjoying routine adoption within typical anatomic pathology (AP) laboratories. This survey gathered current expert perspectives and expectations regarding the role of AI in AP from those with first-hand computational pathology and AI experience. Methods: Perspectives were solicited using the Delphi method from 24 subject matter experts between December 2020 and February 2021 regarding the anticipated role of AI in pathology by the year 2030. The study consisted of three consecutive rounds: 1) an open-ended, free-response questionnaire generating a list of survey items; 2) a Likert-scale survey scored by experts and analyzed for consensus; and 3) a repeat survey of items not reaching consensus to obtain further expert consensus. Findings: Consensus opinions were reached on 141 of 180 survey items (78.3%). Experts agreed that AI would be routinely and impactfully used within AP laboratory and pathologist clinical workflows by 2030. High consensus was reached on 100 items across nine categories encompassing the impact of AI on (1) pathology key performance indicators (KPIs) and (2) the pathology workforce and specific tasks performed by (3) pathologists and (4) AP lab technicians, as well as (5) specific AI applications and their likelihood of routine use by 2030, (6) AI's role in integrated diagnostics, (7) pathology tasks likely to be fully automated using AI, and (8) regulatory/legal and (9) ethical aspects of AI integration in pathology. Interpretation: This is the first systematic consensus study detailing the expected short/mid-term impact of AI on pathology practice.
These findings provide timely and relevant information regarding future care delivery in pathology and raise key practical, ethical, and legal challenges that must be addressed prior to AI's successful clinical implementation. Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

10

Generating highly accurate pathology reports from gigapixel whole slide images with HistoGPT

Tran, M.; Schmidle, P.; Wagner, S. J.; Koch, V.; Lupperger, V.; Feuchtinger, A.; Boehner, A.; Kaczmarczyk, R.; Biedermann, T.; Eyerich, K.; Braun, S. A.; Peng, T.; Marr, C.

2024-03-18 pathology 10.1101/2024.03.15.24304211 medRxiv

Top 0.1%

34.1%

Histopathology is considered the reference standard for diagnosing the presence and nature of many malignancies, including cancer. However, analyzing tissue samples and writing pathology reports is time-consuming, labor-intensive, and non-standardized. To address this problem, we present HistoGPT, the first vision language model that simultaneously generates reports from multiple pathology images. It was trained on more than 15,000 whole slide images from over 6,000 dermatology patients with corresponding pathology reports. The generated reports match the quality of human-written reports, as confirmed by a variety of natural language processing metrics and domain expert evaluations. We show that HistoGPT generalizes to six geographically diverse cohorts and can predict tumor subtypes and tumor thickness in a zero-shot fashion. Our model demonstrates the potential of an AI assistant that supports pathologists in evaluating, reporting, and understanding routine dermatopathology cases.

11

Search and Retrieval in Dermatology Atlases of Histopathology Images for Risk Stratification of Cutaneous Squamous Cell Carcinoma

Alabtah, G.; Alsaafin, A.; Alfasly, S.; Shafique, A.; Hemati, S.; Choudhary, A.; Ravishankar, I. K.; DiCaudo, D.; Nelson, S. A.; Stockard, A.; Leibovit-Reiben, Z.; Zhang, N.; Kalari, K.; Murphree, D.; Mangold, A.; Comfere, N.; Tizhoosh, H. R.

2026-01-06 pathology 10.64898/2026.01.02.26343356 medRxiv

Top 0.1%

33.5%

Cutaneous squamous cell carcinoma (cSCC) poses significant clinical challenges due to its rising incidence and potential for metastasis. Histopathologic risk stratification is further limited by substantial inter-observer variability. Unsupervised AI approaches based on content-based image retrieval offer scalable and interpretable decision support for diagnostic pathology. The objective of this study was to evaluate the use of image retrieval within histopathology atlases to stratify cSCC tumor differentiation from whole-slide images (WSIs), while comparing different patch selection and feature extraction strategies. This retrospective study included 552 archived WSIs comprising 385 well-differentiated, 102 moderately differentiated, and 66 poorly differentiated cases collected across Mayo Clinic sites in Arizona, Florida, and Minnesota. Image atlases were constructed using multiple patch aggregation strategies (Mosaic, Collage, and Montage) and deep learning encoders (KimiaNet, PathDino, and H-Optimus-0). A leave-one-WSI-out evaluation framework was used to assess differentiation classification performance using accuracy, specificity, sensitivity, and F1 score. Mosaic combined with KimiaNet achieved the highest Top-1 accuracy (74.9%) and specificity (92.6%), while Mosaic with H-Optimus-0 yielded the best Top-5 accuracy (79.0%) and macro-F1 score (62.6%). Collage combined with KimiaNet produced the highest Top-5 specificity (99.5%). The generalizability of the evaluated AI models varied across hospitals, reflecting differences in imaging protocols, staining practices, and patient populations. Overall, unsupervised image search and retrieval provides effective, annotation-free support for cSCC differentiation and has the potential to enhance dermatopathology workflows when appropriate combinations of patch selection and feature extraction methods are employed.
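The leave-one-WSI-out protocol above treats each slide in turn as a query against the rest of the atlas and predicts its differentiation grade from the most similar slides. A simplified sketch, assuming one embedding vector per slide (the study instead builds atlases from multiple patches per WSI via Mosaic, Collage, or Montage) and assuming that a Top-k prediction is a majority vote over the k nearest slides, which is one common convention and not necessarily the study's:

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def leave_one_out_predictions(embeddings, labels, k=1):
    """For each query slide, rank all other slides by cosine similarity and predict by
    majority vote among the top k (ties go to the label seen first in the ranking)."""
    predictions = []
    for q, query in enumerate(embeddings):
        others = [i for i in range(len(embeddings)) if i != q]  # exclude the query itself
        ranked = sorted(others, key=lambda i: cosine_similarity(query, embeddings[i]), reverse=True)
        votes = Counter(labels[i] for i in ranked[:k])
        predictions.append(votes.most_common(1)[0][0])
    return predictions


# Hypothetical 2-D slide embeddings and differentiation labels (illustrative only).
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.2, 0.8]]
labels = ["well", "well", "poor", "poor", "poor"]
predictions = leave_one_out_predictions(embeddings, labels, k=1)
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print(predictions, accuracy)  # → ['well', 'well', 'poor', 'poor', 'poor'] 1.0
```

With k = 3 on this toy atlas every slide is predicted "poor", illustrating how a majority vote over more neighbors favors the more numerous grade in the neighborhood.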

12

Detection of TP53 mutations by IHC in acute myeloid leukemia varies with interpreter expertise and mutation status

Richman, L. P.; Waller, B.; Lovitch, S. B.; Jambhekar, A.

2024-11-08 pathology 10.1101/2024.11.07.24316929 medRxiv

Top 0.1%

33.1%

TP53 mutations, including missense and inactivating (frameshift, splice site, and nonsense) mutations, occur in approximately 10% of myeloid neoplasms and confer adverse outcomes. Both the World Health Organization and the International Consensus Classification of myeloid neoplasms now recognize the prognostic and therapeutic importance of early detection of TP53 mutations. p53 immunohistochemistry (IHC) is a simple and rapid method commonly used to detect TP53 mutations. More recently, sequencing via targeted panels has also seen increased use. While highly accurate, sequencing is resource-intensive and not universally available. IHC represents a more accessible option for mutation detection; however, previous studies have demonstrated variable accuracy, especially for inactivating TP53 mutations. Using 134 bone marrow core samples of acute myeloid leukemia (AML) evaluated for TP53 mutation by a sequencing panel, we assessed the concordance of p53 IHC with sequencing as well as the inter-rater reliability for IHC intensity and percent positivity. Consistent with previous studies, we found that p53 IHC was highly specific and modestly sensitive for missense mutations, and that overall performance improved with dedicated hematopathology training. We also found that IHC performed poorly for inactivating mutations and was variable even between cases harboring identical amino acid changes. Low predicted transcriptional activity of TP53 missense mutations correlated with a mutant pattern of IHC staining. The status of the second allele in missense mutations and the variant allele fraction also affected the accuracy of p53 IHC as a surrogate for TP53 allele status. AMLs harboring TP53 mutations that were predicted to have low transcriptional activity correlated with reduced overall survival.
Our results demonstrate limited practical utility of p53 immunohistochemistry for accurate evaluation of TP53 mutation status due to multifactorial confounders.
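Inter-rater reliability for a categorical read-out such as p53 IHC staining intensity is often summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. A minimal sketch with hypothetical intensity grades; the abstract does not state which reliability statistic was used, and a continuous measure such as percent positivity would more typically be assessed with an intraclass correlation coefficient:

```python
import math
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical grades to the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of the two raters' marginal counts for each category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return math.nan  # both raters used one identical category throughout; kappa is undefined
    return (observed - expected) / (1.0 - expected)


rater_1 = ["strong", "strong", "weak", "negative", "negative", "negative"]
rater_2 = ["strong", "weak", "weak", "negative", "negative", "strong"]
print(round(cohens_kappa(rater_1, rater_2), 3))  # → 0.5
```

Here the raters agree on 4 of 6 cases (67%), but a third of that agreement is expected by chance from their marginal frequencies, so kappa is 0.5.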

13

Benchmarking pathology foundation models for non-neoplastic pathology in the placenta

Peng, Z.; Ayad, M. A.; Jing, Y.; Chou, T.; Cooper, L. A. D.; Goldstein, J. A.

2025-03-20 pathology 10.1101/2025.03.19.25324282 medRxiv

Top 0.1%

30.2%

Machine learning (ML) applications within diagnostic histopathology have been extremely successful. While many successful models have been built using general-purpose models trained largely on everyday objects, there is a recent trend toward pathology-specific foundation models, trained using histopathology images. Pathology foundation models show strong performance on cancer detection and subtyping, grading, and predicting molecular diagnoses. However, we have noticed gaps in the testing of foundation models: nearly all the benchmarks used to test them are focused on cancer. Neoplasia is an important pathologic mechanism and a key concern in much of clinical pathology, but it represents only one of many pathologic bases of disease. Non-neoplastic pathology dominates findings in the placenta, a critical organ in human development as well as a specimen commonly encountered in clinical practice. Little or none of the data used in training pathology foundation models is placenta. Thus, placental pathology is doubly out of distribution, representing a useful challenge for foundation models. We developed benchmarks for estimating gestational age, classifying normal tissue, identifying inflammation in the umbilical cord and membranes, and classifying macroscopic lesions including villous infarction, intervillous thrombus, and perivillous fibrin deposition. We tested 5 pathology foundation models and 4 non-pathology models on each benchmark in tasks including zero-shot K-nearest neighbor classification and regression, content-based image retrieval, supervised regression, and whole-slide attention-based multiple instance learning. In each task, the best-performing model was a pathology foundation model. However, the gap between pathology and non-pathology models was diminished in tasks related to inflammation or those in which a supervised task was performed using model embeddings. Performance was comparable among pathology foundation models.
Among non-pathology models, ResNet consistently performed worse, while models from the present decade showed better performance. Future work could examine the impact of incorporating placental data into foundation model training.

14

Machine-learning convergent melanocytic morphology despite noisy archival slides

Tada, M.; Gaskins, G.; Ghandian, S.; Mew, N.; Keiser, M. J.; Keiser, E. S.

2024-09-17 pathology 10.1101/2024.09.12.612732 medRxiv

Top 0.1%

29.7%


Melanocytic atypia, ranging from benign to malignant, often leads to diagnostic discordance, which complicates its prediction by machine learning models. To overcome this, we paired H&E-stained histology images with contiguous or serial sections immunohistochemically (IHC) stained for melanocytic cells using antibodies against MelanA, MelPro, or SOX10. We developed a deep-learning pipeline to identify melanocytic atypia, using a real-world archival dataset of 122 paired whole slide images digitized from 61 confirmed melanoma in situ (MIS) cases at two institutions. Only 37.7% of the cases contained tissue pairs that matched well enough for deep learning. Nonetheless, the MelanA+MelPro models achieved an average area under the receiver operating characteristic curve (AUROC) of 0.948 and an average area under the precision-recall curve (AUPRC) of 0.611, while the SOX10 models averaged 0.867 AUROC and 0.433 AUPRC. Despite learning from biologically different IHC stains, the convolutional neural network (CNN) models independently converged on an intuitive rationale, as shown by explainable AI saliency calculations. Different antibodies, with nuclear versus cytoplasmic staining, provided complementary yet consistent information, which the CNNs integrated effectively. The resulting multi-antibody virtual stains identified morphologic features, both cytologic and small-scale architectural, directly from H&E-stained histology images, which can assist pathologists in assessing cutaneous MIS.
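AUROC and AUPRC measure different things, so a model can score high on one and much lower on the other. A gap like 0.948 versus 0.611 is common when positives are rare. In that situation, false positives drag precision down while barely moving the false-positive rate. The following is a minimal sketch of both metrics in plain Python, assuming binary labels and continuous scores. AUPRC is estimated here as average precision, one common estimator; the abstract does not state which estimator the authors used.

```python
def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """AUPRC estimate: mean of the precision at the rank of each true positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / sum(labels)

# Two positives among ten predictions (hypothetical data).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(auroc(labels, scores))                         # 0.9375
print(round(average_precision(labels, scores), 4))   # 0.8333
```

The pairwise AUROC loop is quadratic. It is fine for illustration, but rank-based or library implementations are preferable for pixel-level evaluation.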

15

ALPaCA: Adapting Llama for Pathology Context Analysis to enable slide-level question answering

Gao, Z.; He, K.; Su, W.; Machado, I. P.; McGough, W.; Jimenez-Linan, M.; Rous, B.; Wang, C.; Li, C.; Pang, X.; Gong, T.; Lu, M. Y.; Mahmood, F.; Feng, M.; Li, C.; Crispin-Ortuzar, M.

2025-04-22 pathology 10.1101/2025.04.22.25326190 medRxiv

Top 0.1%

29.0%


Large Vision Language Models (LVLMs) have recently revolutionized computational pathology. LVLMs transform pathology image embeddings into tokens recognizable by large language models, facilitating zero-shot image classification, description generation, question answering, and interactive diagnostics. In clinical practice, pathological assessments often require the analysis of entire tissue slides, integrating information from multiple sub-regions and magnification levels. However, existing LVLM frameworks have been restricted to the analysis of small, predefined regions of interest and cannot analyze pyramidal, gigapixel-scale whole-slide images (WSIs). In this work, we introduce ALPaCA (Adapting Llama for Pathology Context Analysis), the first general-purpose slide-level LVLM, trained on 35,913 WSIs with curated descriptions alongside 341,051 question-and-answer pairs encompassing diverse diagnoses, procedures, and tissue types. ALPaCA combines LongFormer, a vision-text interactive slide-level adaptor that we developed, with a Gaussian mixture model-based prototyping adaptor, and is trained with Llama3.1. It achieves superior performance in slide-level question answering, with over 90% accuracy on closed-ended tests and high accuracy on open-ended questions as evaluated by expert pathologists, highlighting its potential for slide-level computer-aided diagnosis systems. Additionally, we show that ALPaCA can be readily fine-tuned on in-depth, organ-specific, or disease-specific datasets, underscoring its adaptability and utility for specialized pathology tasks.
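This abstract names a Gaussian mixture model-based prototyping adaptor but does not describe how it works. One plausible reading, offered purely as an assumption, goes as follows. A GMM is fit to a slide's patch embeddings, and its K component means serve as a fixed-size set of prototype vectors. Thousands of patches are then summarized by K vectors. The sketch below fits a diagonal-covariance GMM with plain expectation-maximization (EM). `fit_gmm_prototypes` is a hypothetical name and is not part of ALPaCA.

```python
import math

def fit_gmm_prototypes(points, k, iters=50, eps=1e-6):
    """Fit a diagonal-covariance GMM with EM; return (weights, means, variances).

    The k means act as prototype vectors summarizing all input embeddings.
    """
    d = len(points[0])
    # Deterministic farthest-point initialization so components start apart.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(
            sum((a - b) ** 2 for a, b in zip(p, m)) for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point, computed
        # in log space and normalized with the max trick for numerical stability.
        resp = []
        for p in points:
            logs = []
            for w, m, v in zip(weights, means, variances):
                ll = math.log(w)
                for x, mu, var in zip(p, m, v):
                    ll -= 0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
                logs.append(ll)
            top = max(logs)
            exps = [math.exp(lg - top) for lg in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate mixing weights, means, and per-dimension variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + eps
            weights[j] = nj / len(points)
            means[j] = [sum(r[j] * p[t] for r, p in zip(resp, points)) / nj
                        for t in range(d)]
            variances[j] = [sum(r[j] * (p[t] - means[j][t]) ** 2
                                for r, p in zip(resp, points)) / nj + eps
                            for t in range(d)]
    return weights, means, variances

# Toy 2-D "patch embeddings" from two tissue regions (hypothetical data).
patches = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0],
           [10.0, 10.1], [9.9, 10.0], [10.1, 9.9]]
weights, prototypes, _ = fit_gmm_prototypes(patches, k=2)
print(sorted(prototypes))  # two prototypes, near (0.03, 0.0) and (10.0, 10.0)
```

In a slide-level model, such prototypes would then be projected into the language model's token space. Whether ALPaCA does exactly this is not stated in the abstract.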

16

Pigmented Paraganglioid Carcinoid Tumors of the Lung: Spatial Transcriptomics Reveals Shared and Distinct Features with Typical Carcinoid Tumors and Extra-Adrenal Paragangliomas

Bahmad, H. F.; Perez-Tagle-Tejeda, A.; Cisneros-Gonzalez, B. M.; Santoscoy-Valencia, R.; Alvarez-Lesmes, J.; Drews-Elger, K.; Briski, L. M.; Lora-Gonzalez, M.; Pinto, A.; Rosenberg, A. E.; Ruiz-Cordero, R.

2025-12-02 pathology 10.64898/2025.11.29.25341268 medRxiv

Top 0.1%

27.0%


Pigmented paraganglioid carcinoid tumors (PPCT) of the lung are a rare, underrecognized, and poorly characterized morphologic variant of pulmonary neuroendocrine tumors (NETs). While these tumors are usually diagnosed as typical carcinoid (TC) tumors, PPCT may pose a diagnostic challenge because of their histologic resemblance to extra-adrenal paraganglioma (PG). In this study, we aimed to comprehensively characterize the histomorphologic, immunophenotypic, and transcriptomic profiles of PPCT in comparison to TC and PG using spatially resolved transcriptomic analysis. Using a tissue microarray (TMA) composed of 38 tumors, including 20 TC, 16 PG, and 2 PPCT, we performed immunohistochemical (IHC) and digital spatial transcriptomic (GeoMx® DSP) profiling. The TMA included two punches and two regions of interest (ROIs) per case. Cellular transcriptomes were selected based on epithelial (PanCK+), sustentacular (S100+), and immune (CD45+) compartments. By IHC, PPCT retained neuroendocrine markers (synaptophysin, INSM1, chromogranin A) but showed decreased or absent pancytokeratin cocktail expression and an increased number of sustentacular cells highlighted by strong expression of S100 and SOX-10, similar to PG. Expression of AE1/AE3 and CK8/18 confirmed their epithelial origin and helped distinguish them from PG. The transcriptome of PPCT clustered with that of TC but displayed distinct expression patterns in a small subset of genes. Although the sustentacular and immune compartments showed limited divergence, the epithelial compartment of PPCT showed differentially expressed genes, including FABP5, MLPH, GPNMB, and SOX1, indicating upregulation of melanocytic and neural crest markers. Gene set enrichment analysis (GSEA) revealed significant upregulation of pathways related to inflammation (e.g., TLR4-TRAF6-TAK1), PTEN trafficking, and inositol phosphate metabolism. PPCT show increased melanocytic pathway expression, which may explain their morphologic resemblance to PG.

17

Evaluating Spiking and Non-Spiking Neural Networks for Colorectal Serrated Polyp Subtype Classification

Littlefield, N.; Bao, R.; Xia, R.; Gu, Q.

2026-01-27 pathology 10.64898/2026.01.24.26344766 medRxiv

Top 0.1%

27.0%


Image classification on digital pathology images relies heavily on convolutional neural networks (CNNs), yet the behavior of alternative neural computing paradigms in this domain remains insufficiently characterized. Spiking neural networks (SNNs), which process information through event-driven, spike-based dynamics, have recently become trainable at scale but have not been evaluated on standardized colorectal pathology benchmarks. This study presents the first controlled comparison of SNNs and CNNs on the Minimalist Histopathology Image Analysis (MHIST) dataset, a widely used, publicly available benchmark released by Dartmouth-Hitchcock Medical Center for reproducible evaluation of histopathology classification models. The classification task focuses on the clinically important binary distinction between hyperplastic polyps (HPs) and sessile serrated adenomas (SSAs). This is a challenging problem with substantial inter-pathologist variability: HPs are typically benign, whereas SSAs are precancerous lesions requiring closer clinical follow-up. Histologically, HPs exhibit superficial serrated architecture and elongated crypts, whereas SSAs are characterized by broad-based, often complex crypt structures with pronounced serration. A conventional ResNet-18 architecture and its spiking counterpart are evaluated under matched training and inference conditions to isolate the effect of spiking computation. Model performance is quantified using the area under the receiver operating characteristic curve (ROC-AUC), yielding 0.817 for the conventional CNN and 0.812 for the SNN. This comparison enables a direct assessment of how spiking computation influences discriminative performance in binary HP-versus-SSA classification and provides a benchmark reference for SNNs on the MHIST dataset. The code is publicly available at https://github.com/qug125/snn-crcp.
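Spiking networks replace continuous activations with neurons that integrate input over discrete time steps and emit binary spikes. The sketch below shows a minimal leaky integrate-and-fire (LIF) neuron, a unit widely used when converting CNN architectures such as ResNet into spiking counterparts. The abstract does not say which neuron model, time-step count, or training method the authors used. The decay factor `beta`, the `threshold`, the soft-reset rule, and the rate readout are therefore all assumptions.

```python
def lif_run(inputs, beta=0.9, threshold=1.0):
    """Simulate one LIF neuron over time steps and return its binary spike train.

    At each step the membrane potential decays by `beta` and adds the input
    current. When it reaches `threshold`, the neuron emits a spike (1) and the
    threshold is subtracted from the potential (soft reset).
    """
    v = 0.0
    spikes = []
    for current in inputs:
        v = beta * v + current
        if v >= threshold:
            spikes.append(1)
            v -= threshold
        else:
            spikes.append(0)
    return spikes

def rate_decode(spikes):
    """Read a spike train out as a firing rate, loosely analogous to a CNN activation."""
    return sum(spikes) / len(spikes)

train = lif_run([0.5] * 8)
print(train)               # [0, 0, 1, 0, 1, 0, 1, 0]
print(rate_decode(train))  # 0.375
```

The hard threshold has no useful gradient. Surrogate-gradient training, which substitutes a smooth approximation during backpropagation, is one of the main techniques that has made deep SNNs trainable.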

18

ViFIT-assisted Histopathology: From H&E Style Standardization to Virtual Fiber Image Transformation

Wang, S.; Zhang, X.; Wang, X.; Lv, C.; Han, X.; Lin, X.; Kang, D.; Lin, R.; Hu, L.; Huang, F.; Liu, W.; Chen, J.

2025-01-26 pathology 10.1101/2025.01.24.634654 medRxiv

Top 0.1%

24.2%


Deep learning-based virtual fiber staining provides a promising complement to routine H&E pathology. However, the reliance on predefined staining style inputs and manual intervention limits the clinical applicability of existing methods. To address these challenges, we introduce ViFIT-assisted histopathology, a two-stage diagnostic approach that integrates our proposed unsupervised deep learning-based virtual fiber transformation model (ViFIT). This approach converts H&E-stained images with diverse styles into pathologist-preferred H&E images while simultaneously generating content-consistent virtual fiber images containing label-free collagen fibers and stained reticular and elastic fibers. ViFIT-assisted histopathology reveals tumor-associated fibers and provides quantitative metrics across multiple intraoperative and postoperative cases. Experimental results demonstrate that ViFIT significantly outperforms state-of-the-art unsupervised methods in both style standardization and virtual staining, across various downstream tasks and cancer types. By removing the effects of staining variation and the need for manual annotation, ViFIT-assisted histopathology streamlines histopathology workflows, making it well-suited for multi-center consultations and differential diagnosis.

19

Muddying the Waters: Syncytial Variant Nodular Sclerosis Classic Hodgkin Lymphomas Exhibit a Primary Mediastinal Large B-cell Lymphoma Like Gene Expression Profile Using the Lymph3Cx Gene Expression Profiling Assay

Cheng, J.; Gibson, S.; Barry, R.; Robetorye, R. S.

2025-05-13 pathology 10.1101/2025.05.12.25327251 medRxiv

Top 0.1%

23.7%


Mediastinal B-cell lymphomas are relatively frequent in young patients and include nodular sclerosis classic Hodgkin lymphoma (NSCHL), primary mediastinal large B-cell lymphoma (PMBL), and, rarely, mediastinal gray zone lymphoma (MGZL). Occasional NSCHLs contain abundant Hodgkin/Reed-Sternberg (HRS) cells that exhibit a syncytial growth pattern, which may create diagnostic challenges. Previous studies have demonstrated that PMBL can be distinguished from subtypes of diffuse large B-cell lymphoma (DLBCL) based on gene expression signatures using the Lymph3Cx gene expression profiling assay, which has been validated as a clinical test in our molecular diagnostics laboratory. Here, we demonstrate that syncytial variant NSCHL exhibits a PMBL-like gene expression profile using the Lymph3Cx gene expression profiling assay. It is critical for pathologists and oncologists to be aware of this potential diagnostic pitfall to avoid possible misdiagnosis.

20

Large-Scale Validation Study of an Improved Semi-Autonomous Urine Cytology Assessment Tool: AutoParis-X

Levy, J.; Chan, N.; Marotti, J.; Kerr, D.; Gutmann, E.; Glass, R.; Dodge, C.; Suriawinata, A.; Christensen, B.; Liu, X.; Vaickus, L.

2023-03-02 pathology 10.1101/2023.03.01.23286639 medRxiv

Top 0.1%

23.4%


Adopting a computational approach for the assessment of urine cytology specimens has the potential to improve the efficiency, accuracy, and reliability of bladder cancer screening, which has heretofore relied on semi-subjective manual assessment methods. Rigorous, quantitative criteria and guidelines have been introduced to improve screening practices, e.g., The Paris System for Reporting Urinary Cytology (TPS). Yet algorithms that emulate semi-autonomous diagnostic decision-making have lagged behind, in part due to the complex and nuanced nature of urine cytology reporting. In this study, we report on a deep learning tool, AutoParis-X, which can facilitate rapid semi-autonomous examination of urine cytology specimens. A large-scale retrospective validation study indicates that AutoParis-X can accurately determine urothelial cell atypia. It can also aggregate a wide variety of cell- and cluster-related information across a slide to yield an Atypia Burden Score (ABS) that correlates closely with overall specimen atypia and is predictive of TPS diagnostic categories. Importantly, this approach accounts for challenges associated with assessing overlapping cell cluster borders. Addressing them improved the ability to predict specimen atypia and to accurately estimate the nuclear-to-cytoplasmic (NC) ratio for cells in these clusters. We developed a publicly available, open-source interactive web application with a simple, easy-to-use display for examining urine cytology whole-slide images (WSI), determining the atypia level of specific cells, and flagging the most abnormal cells for pathologist review. The accuracy of AutoParis-X (and other semi-automated digital pathology systems) indicates that these technologies are approaching clinical readiness and warrant full evaluation in head-to-head clinical trials.
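Two quantities from this abstract can be illustrated concretely. First, the sketch computes the nuclear-to-cytoplasmic ratio from binary segmentation masks as nuclear area divided by whole-cell area. That is one common operational definition; others divide by cytoplasmic area alone, and the abstract does not give AutoParis-X's formula. Second, the slide-level aggregate is a hypothetical stand-in for the Atypia Burden Score: the fraction of cells whose atypia score exceeds a cutoff. The real ABS definition is not given here, and `nc_ratio`, `burden_score`, and the 0.5 cutoff are all assumptions.

```python
def nc_ratio(nucleus_mask, cell_mask):
    """Nuclear area over whole-cell area, from same-shaped 0/1 masks (lists of rows).

    Assumption: the cell mask covers nucleus plus cytoplasm.
    """
    nucleus = sum(sum(row) for row in nucleus_mask)
    cell = sum(sum(row) for row in cell_mask)
    return nucleus / cell

def burden_score(cell_scores, cutoff=0.5):
    """Hypothetical slide aggregate: fraction of cells scored above `cutoff`."""
    if not cell_scores:
        return 0.0
    return sum(1 for s in cell_scores if s > cutoff) / len(cell_scores)

cell = [[1] * 4 for _ in range(4)]              # 16-pixel cell
nucleus = [[0, 0, 0, 0], [0, 1, 1, 0],
           [0, 1, 1, 0], [0, 0, 0, 0]]          # 4-pixel nucleus
print(nc_ratio(nucleus, cell))                  # 0.25
print(burden_score([0.1, 0.7, 0.9, 0.2]))       # 0.5
```

Overlapping clusters are the hard case the abstract highlights. When cell borders are ambiguous, the denominator of `nc_ratio` becomes unreliable, which is why border handling matters for this estimate.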